1 Special issue: Game-playing programs: theory and practice
M. A. Bramer 

April 1982 ACM SIGART Bulletin, issue 80 
Publisher: ACM Press 

Full text available: pdf(9.23 MB) Additional Information: full citation, abstract

This collection of articles has been brought together to provide SIGART members with an 
overview of Artificial Intelligence approaches to constructing game-playing programs. 
Papers on both theory and practice are included. 



2 Web Site Analysis: Statistical profiles of highly-rated web sites
Melody Y. Ivory, Marti A. Hearst

April 2002 Proceedings of the SIGCHI conference on Human factors in computing 
systems: Changing our world, changing ourselves CHI '02 

Publisher: ACM Press 

Full text available: pdf(!78 MB) Additional Information: full citation, abstract, references, citings, index terms

We are creating an interactive tool to help non-professional web site builders create high 
quality designs. We have previously reported that quantitative measures of web page 
structure can predict whether a site will be highly or poorly rated by experts, with 
accuracies ranging from 67-80%. In this paper we extend that work in several ways.
First, we compute a much larger set of measures (157 versus 11), over a much larger 
collection of pages (5300 vs. 1900), achieving much higher overall accur ... 

Keywords: World Wide Web, automated usability evaluation, empirical studies, web site 
design 



3 eTuner: tuning schema matching software using synthetic scenarios
Yoonkyong Lee, Mayssam Sayyadian, AnHai Doan, Arnon S. Rosenthal
October 2006 The VLDB Journal - The International Journal on Very Large Data
Bases, Volume 16 Issue 1
Publisher: Springer-Verlag New York, Inc. 

Full text available: pdf(1.01 MB) Additional Information: full citation, abstract, index terms

Most recent schema matching systems assemble multiple components, each employing a 
particular matching technique. The domain user must then tune the system: select the
right component to be executed and correctly adjust their numerous "knobs" (e.g.,
thresholds, formula coefficients). Tuning is skill and time intensive, but (as we show) 
without it the matching accuracy is significantly inferior. We describe eTuner, an approach 
to automatically tune schema matchin ... 

Keywords: Compositional approach, Machine learning, Schema matching, Synthetic 
schemas, Tuning 



4 Using a mixture of probabilistic decision trees for direct prediction of protein function
Umar Syed, Golan Yona 

April 2003 Proceedings of the seventh annual international conference on Research 
in computational molecular biology RECOMB '03

Publisher: ACM Press 

Full text available: pdf(306.22 KB) Additional Information: full citation, abstract, references, index terms

We study the direct relationship between basic protein properties and their function. Our 
goal is to develop a new tool for functional prediction that can be used to complement and 
support other techniques based on sequence or structure information. In order to define 
this new measure of similarity between proteins we collected a set of 453 features and 
properties that characterize proteins and are believed to be correlated and related to 
structural and functional aspects of proteins. Among thes ... 

Keywords: decision trees, functional prediction, sequence-function relationships 




5 Propositional Satisfiability and Constraint Programming: A comparative survey

Lucas Bordeaux, Youssef Hamadi, Lintao Zhang 

December 2006 ACM Computing Surveys (CSUR), volume 38 issue 4 

Publisher: ACM Press 

Full text available: pdf(878.93 KB) Additional Information: full citation, abstract, references, index terms

Propositional Satisfiability (SAT) and Constraint Programming (CP) have developed as two 
relatively independent threads of research cross-fertilizing occasionally. These two 
approaches to problem solving have a lot in common as evidenced by similar ideas 
underlying the branch and prune algorithms that are most successful at solving both kinds 
of problems. They also exhibit differences in the way they are used to state and solve 
problems since SAT's approach is, in general, a black-box approach, ... 

Keywords: SAT, Search, constraint satisfaction 



6 Shape-based retrieval and analysis of 3D models 
Thomas Funkhouser, Michael Kazhdan 

August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 
Publisher: ACM Press 

Full text available: pdf(12.56 MB) Additional Information: full citation, abstract

Large repositories of 3D data are rapidly becoming available in several fields, including 
mechanical CAD, molecular biology, and computer graphics. As the number of 3D models 
grows, there is an increasing need for computer algorithms to help people find the 
interesting ones and discover relationships between them. Unfortunately, traditional text-based
search techniques are not always effective for 3D models, especially when queries
are geometric in nature (e.g., find me objects that fit into thi ...

Johannes Gehrke, Wei-Yin Loh, Raghu Ramakrishnan

August 1999 Tutorial notes of the fifth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '99 

Publisher: ACM Press 

Full text available: pdf(Z95 MB) Additional Information: full citation, abstract, references, citings, index terms

With over 800 million pages covering most areas of human endeavor, the World-wide 
Web is a fertile ground for data mining research to make a difference to the effectiveness 
of information search. Today, Web surfers access the Web through two dominant
interfaces: clicking on hyperlinks and searching via keyword queries. This process is often
tentative and unsatisfactory. Better support is needed for expressing one's information
need and dealing with a search result in more structured ways than ... 

8 A survey of Web metrics

Devanshu Dhyani, Wee Keong Ng, Sourav S. Bhowmick 

December 2002 ACM Computing Surveys (CSUR), Volume 34 issue 4 

Publisher: ACM Press 

Full text available: pdf(289.28 KB) Additional Information: full citation, abstract, references, citings, index terms

The unabated growth and increasing significance of the World Wide Web has resulted in a 
flurry of research activity to improve its capacity for serving information more effectively. 
But at the heart of these efforts lie implicit assumptions about "quality" and "usefulness" 
of Web resources and services. This observation points towards measurements and 
models that quantify various attributes of web sites. The science of measuring all aspects 
of information, especially its storage and retrieval or ... 

Keywords: Information theoretic, PageRank, Web graph, Web metrics, Web page 
similarity, quality metrics 



9 The Web as a parallel corpus
Philip Resnik, Noah A. Smith 

September 2003 Computational Linguistics, volume 29 issue 3 
Publisher: MIT Press 

Full text available: pdf(539.83 KB) Additional Information: full citation, abstract, references, citings, index terms

Parallel corpora have become an essential resource for work in multilingual natural 
language processing. In this article, we report on our work using the STRAND system for 
mining parallel text on the World Wide Web, first reviewing the original algorithm and
results and then presenting a set of significant enhancements. These enhancements 
include the use of supervised learning based on structural features of documents to 
improve classification performance, a new content-based measure of translati ... 

10 Machine learning in automated text categorization 
Fabrizio Sebastiani 

March 2002 ACM Computing Surveys (CSUR), Volume 34 issue 1
Publisher: ACM Press 

Full text available: pdf(524.41 KB) Additional Information: full citation, abstract, references, citings, index terms

The automated categorization (or classification) of texts into predefined categories has 
witnessed a booming interest in the last 10 years, due to the increased availability of 
documents in digital form and the ensuing need to organize them. In the research 
community the dominant approach to this problem is based on machine learning 
techniques: a general inductive process automatically builds a classifier by learning, from
a set of preclassified documents, the characteristics of the categories.

Keywords: Machine learning, text categorization, text classification



11 Interruptible anytime algorithms for iterative improvement of decision trees 
Saher Esmeir, Shaul Markovitch 

August 2005 Proceedings of the 1st international workshop on Utility-based data 
mining UBDM '05 

Publisher: ACM Press 

Full text available: pdf(223.84 KB) Additional Information: full citation, abstract, references, index terms

Finding a minimal decision tree consistent with the examples is an NP-complete problem. 
Therefore, most of the existing algorithms for decision tree induction use a greedy 
approach based on local heuristics. These algorithms usually require a fixed small amount 
of time and result in trees that are not globally optimal. Recently, the LSID3 contract 
anytime algorithm was introduced to allow using extra resources for building better 
decision trees. A contract anytime algorithm needs to get its reso ... 

Keywords: anytime algorithms, anytime learning, cost-quality tradeoff, decision trees, 
hard concepts 



12 Papers: Decision Tree Learning Algorithm with structured attributes: application to 
verbal case frame acquisition 

Hideki Tanaka 

August 1996 Proceedings of the 16th conference on Computational linguistics - 
Volume 2 

Publisher: Association for Computational Linguistics 

Full text available: pdf(558.59 KB) Additional Information: full citation, abstract, references, citings

The Decision Tree Learning Algorithms (DTLAs) are getting keen attention from the 
natural language processing research community, and there have been a series of 
attempts to apply them to verbal case frame acquisition. However, a DTLA cannot handle 
structured attributes like nouns, which are classified under a thesaurus. In this paper, we 
present a new DTLA that can rationally handle the structured attributes. In the process of 
tree generation, the algorithm generalizes each attribute optimally ... 

13 Data clustering: a review 
A. K. Jain, M. N. Murty, P. J. Flynn 

September 1999 ACM Computing Surveys (CSUR), Volume 31 issue 3 
Publisher: ACM Press 

Full text available: pdf(636.24 KB) Additional Information: full citation, abstract, references, citings, index terms, review

Clustering is the unsupervised classification of patterns (observations, data items, or 
feature vectors) into groups (clusters). The clustering problem has been addressed in 
many contexts and by researchers in many disciplines; this reflects its broad appeal and 
usefulness as one of the steps in exploratory data analysis. However, clustering is a 
difficult problem combinatorially, and differences in assumptions and contexts in different 
communities has made the transfer of useful generic co ... 

Keywords: cluster analysis, clustering applications, exploratory data analysis, 
incremental clustering, similarity indices, unsupervised learning

Joseph Turian, I. Dan Melamed

July 2006 Proceedings of the 21st International Conference on Computational 
Linguistics and the 44th annual meeting of the ACL ACL '06 

Publisher: Association for Computational Linguistics 

Full text available: pdf(112.02 KB) Additional Information: full citation, abstract, references

The present work advances the accuracy and training speed of discriminative parsing. Our 
discriminative parsing method has no generative component, yet surpasses a generative 
baseline on constituent parsing, and does so with minimal linguistic cleverness. Our model 
can incorporate arbitrary features of the input and parse state, and performs feature 
selection incrementally over an exponential feature space during training. We 
demonstrate the flexibility of our approach by testing it with several ... 

15 Biclustering Algorithms for Biological Data Analysis: A Survey 
Sara C. Madeira, Arlindo L. Oliveira

Most results in the field of algorithm design are single algorithms that solve single
problems. In this paper we discuss multidimensional divide-and-conquer, an algorithmic 
paradigm that can be instantiated in many different ways to yield a number of algorithms 
and data structures for multidimensional problems. We use this paradigm to give best-known
solutions to such problems as the ECDF, maxima, range searching, closest pair,
and all nearest neighbor prob ...

The web is now becoming one of the largest information and knowledge repositories.
Many large scale search engines (Google, Fast, Northern Light, etc.) have emerged to 
help users find information. In this paper, we study how we can effectively use these 
existing search engines to mine the Web and discover the "correct" answers to factual 
natural language questions. We propose a probabilistic algorithm called QASM (Question 
Answering using Statistical Models) that learns the best query para ...

The anomaly-detection problem can be formulated as one of learning to characterize the
behaviors of an individual, system, or network in terms of temporal sequences of discrete 
data. We present an approach on the basis of instance-based learning (IBL) techniques. 
To cast the anomaly-detection task in an IBL framework, we employ an approach that 
transforms temporal sequences of discrete, unordered observations into a metric space 
via a similarity measure that encodes intra-attribute depende ...

As one of the most successful applications of image analysis and understanding, face
recognition has recently received significant attention, especially during the past several 
years. At least two reasons account for this trend: the first is the wide range of 
commercial and law enforcement applications, and the second is the availability of 
feasible technologies after 30 years of research. Even though current machine recognition 
systems have reached a certain level of maturity, their success is ...